深度学习已经变得过于复杂,并且在解决图像分类,对象检测等若干古典问题方面享有恒星的成功。已经提出了几种解释这些决定的方法。由于它们不利用模型的内部来解释该决定,为生成显着性图产生显着性图的方法特别感到很有趣。大多数黑匣子方法扰乱了输入并观察输出的变化。我们将显着的图形制定为顺序搜索问题,并利用加强学习(RL)来累积来自输入图像的证据,最强烈地支持分类器的决策。这种战略鼓励智能地搜索扰动,这将导致高质量的解释。虽然成功的黑匣子解释方法需要依靠重计算并遭受小的样本近似,但我们的方法学到的确定性政策使得在推理期间更有效。三个基准数据集的实验证明了在不损害性能的情况下推动了推理时间的提议方法的优越性。项目页面:https://cvir.github.io/projects/rexl.html
translated by 谷歌翻译
Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined as RepDIB, to learn structured factorized representations. Exploiting the expressiveness bought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.
translated by 谷歌翻译
Quantitative cephalometric analysis is the most widely used clinical and research tool in modern orthodontics. Accurate localization of cephalometric landmarks enables the quantification and classification of anatomical abnormalities, however, the traditional manual way of marking these landmarks is a very tedious job. Endeavours have constantly been made to develop automated cephalometric landmark detection systems but they are inadequate for orthodontic applications. The fundamental reason for this is that the amount of publicly available datasets as well as the images provided for training in these datasets are insufficient for an AI model to perform well. To facilitate the development of robust AI solutions for morphometric analysis, we organise the CEPHA29 Automatic Cephalometric Landmark Detection Challenge in conjunction with IEEE International Symposium on Biomedical Imaging (ISBI 2023). In this context, we provide the largest known publicly available dataset, consisting of 1000 cephalometric X-ray images. We hope that our challenge will not only derive forward research and innovation in automatic cephalometric landmark identification but will also signal the beginning of a new era in the discipline.
translated by 谷歌翻译
We propose RANA, a relightable and articulated neural avatar for the photorealistic synthesis of humans under arbitrary viewpoints, body poses, and lighting. We only require a short video clip of the person to create the avatar and assume no knowledge about the lighting environment. We present a novel framework to model humans while disentangling their geometry, texture, and also lighting environment from monocular RGB videos. To simplify this otherwise ill-posed task we first estimate the coarse geometry and texture of the person via SMPL+D model fitting and then learn an articulated neural representation for photorealistic image generation. RANA first generates the normal and albedo maps of the person in any given target body pose and then uses spherical harmonics lighting to generate the shaded image in the target lighting environment. We also propose to pretrain RANA using synthetic images and demonstrate that it leads to better disentanglement between geometry and texture while also improving robustness to novel body poses. Finally, we also present a new photorealistic synthetic dataset, Relighting Humans, to quantitatively evaluate the performance of the proposed approach.
translated by 谷歌翻译
Denoising diffusion models hold great promise for generating diverse and realistic human motions. However, existing motion diffusion models largely disregard the laws of physics in the diffusion process and often generate physically-implausible motions with pronounced artifacts such as floating, foot sliding, and ground penetration. This seriously impacts the quality of generated motions and limits their real-world application. To address this issue, we present a novel physics-guided motion diffusion model (PhysDiff), which incorporates physical constraints into the diffusion process. Specifically, we propose a physics-based motion projection module that uses motion imitation in a physics simulator to project the denoised motion of a diffusion step to a physically-plausible motion. The projected motion is further used in the next diffusion step to guide the denoising diffusion process. Intuitively, the use of physics in our model iteratively pulls the motion toward a physically-plausible space. Experiments on large-scale human motion datasets show that our approach achieves state-of-the-art motion quality and improves physical plausibility drastically (>78% for all datasets).
translated by 谷歌翻译
Data scarcity is a notable problem, especially in the medical domain, due to patient data laws. Therefore, efficient Pre-Training techniques could help in combating this problem. In this paper, we demonstrate that a model trained on the time direction of functional neuro-imaging data could help in any downstream task, for example, classifying diseases from healthy controls in fMRI data. We train a Deep Neural Network on Independent components derived from fMRI data using the Independent component analysis (ICA) technique. It learns time direction in the ICA-based data. This pre-trained model is further trained to classify brain disorders in different datasets. Through various experiments, we have shown that learning time direction helps a model learn some causal relation in fMRI data that helps in faster convergence, and consequently, the model generalizes well in downstream classification tasks even with fewer data records.
translated by 谷歌翻译
National Health and Nutritional Status Survey (NHANSS) is conducted annually by the Ministry of Health in Negara Brunei Darussalam to assess the population health and nutritional patterns and characteristics. The main aim of this study was to discover meaningful patterns (groups) from the obese sample of NHANSS data by applying data reduction and interpretation techniques. The mixed nature of the variables (qualitative and quantitative) in the data set added novelty to the study. Accordingly, the Categorical Principal Component (CATPCA) technique was chosen to interpret the meaningful results. The relationships between obesity and the lifestyle factors like demography, socioeconomic status, physical activity, dietary behavior, history of blood pressure, diabetes, etc., were determined based on the principal components generated by CATPCA. The results were validated with the help of the split method technique to counter verify the authenticity of the generated groups. Based on the analysis and results, two subgroups were found in the data set, and the salient features of these subgroups have been reported. These results can be proposed for the betterment of the healthcare industry.
translated by 谷歌翻译
语言,视觉和多模式预审查的大量融合正在出现。在这项工作中,我们介绍了通用多模式基础模型BEIT-3,该模型BEIT-3,该模型在视觉和视觉任务上都实现了最新的转移性能。具体来说,我们从三个方面提出了大融合:骨干架构,预训练任务和模型扩展。我们介绍了多道路变压器进行通用建模,其中模块化体系结构可以实现深融合和模态特定的编码。基于共享的骨干,我们以统一的方式对图像(Imglish),文本(英语)和图像文本对(“平行句子”)进行蒙面的“语言”建模。实验结果表明,BEIT-3在对象检测(COCO),语义分割(ADE20K),图像分类(Imagenet),视觉推理(NLVR2),视觉询问答案(VQAV2),图像字幕上获得最先进的性能(可可)和跨模式检索(Flickr30k,可可)。
translated by 谷歌翻译
滚动轴承是旋转机械的最关键组成部分。及时识别有缺陷的轴承可能会阻止整个机械系统的故障。由于机器零件的快速发展,机械状况监测场已进入大数据阶段。当使用大量数据时,手动特征提取方法的缺点是效率低下和不准确。近年来,诸如深度学习方法之类的数据驱动方法已成功用于机械智能故障检测。卷积神经网络(CNN)主要用于早期研究中,以检测和识别轴承断层。但是,CNN模型遭受了难以管理故障时间信息的缺点,这导致缺乏分类结果。在这项研究中,使用最先进的视觉变压器(VIT)对轴承缺陷进行了分类。使用Case Western Reserve University(CWRU)实验室实验数据对轴承缺陷进行了分类。该研究还考虑了除正常轴承条件外,在0负载情况下的13种不同类型的缺陷。使用短时傅立叶变换(STFT),将振动信号转换为2D时频图像。 2D时频图像用作VIT的输入参数。该模型的总体准确度为98.8%。
translated by 谷歌翻译
在视频中,人类的行为是三维(3D)信号。这些视频研究了人类行为的时空知识。使用3D卷积神经网络(CNN)研究了有希望的能力。 3D CNN尚未在静止照片中为其建立良好的二维(2D)等效物获得高输出。董事会3D卷积记忆和时空融合面部训练难以防止3D CNN完成非凡的评估。在本文中,我们实施了混合深度学习体系结构,该体系结构结合了Stip和3D CNN功能,以有效地增强3D视频的性能。实施后,在每个时空融合圈中进行训练的较详细和更深的图表。训练模型在处理模型的复杂评估后进一步增强了结果。视频分类模型在此实现模型中使用。引入了使用深度学习的多媒体数据分类的智能3D网络协议,以进一步了解人类努力中的时空关联。在实施结果时,著名的数据集(即UCF101)评估了提出的混合技术的性能。结果击败了提出的混合技术,该混合动力技术基本上超过了最初的3D CNN。将结果与文献的最新框架进行比较,以识别UCF101的行动识别,准确度为95%。
translated by 谷歌翻译